Achieving Better Compression Applying Index-based Byte-Pair Transformation before Arithmetic Coding

Authors

  • Jyotika Doshi
  • Savita Gandhi
Abstract

Arithmetic coding is used in many compression techniques during the entropy encoding stage. Further compression is not possible without changing the data model and increasing redundancy in the data set. To increase the redundancy, we apply index-based byte-pair transformation (BPT-I) as a pre-processing step before arithmetic coding. BPT-I transforms the most frequent byte-pairs (2-byte integers). The most frequent byte-pairs are sorted in order of their frequency, and groups of 256 byte-pairs are formed. Each byte-pair in a group is then encoded using two tokens: the group number and the location within the group. The group number is denoted by a variable-length prefix codeword, whereas the location within a group is denoted by an 8-bit index. BPT-I is designed to be applied to any type of source, not necessarily text. The more groups considered during transformation, the better the compression. Experimental results show around 4.30% additional reduction in compressed file size when arithmetic coding is applied after the byte-pair data transformation BPT-I.

General Terms: Data Compression, Algorithms
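The grouping step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`build_groups`, `tokenize`, `detokenize`), the overlapping pair count, the greedy left-to-right matching, and the literal token for untransformed bytes are all assumptions made for illustration. The abstract does not specify the variable-length prefix codeword, so the group number is kept as a plain integer here.

```python
from collections import Counter

GROUP_SIZE = 256  # from the abstract: each group holds 256 byte-pairs


def build_groups(data: bytes, num_groups: int) -> dict:
    """Map the most frequent byte-pairs to (group number, 8-bit index)."""
    # Assumption: pairs are counted at every overlapping position.
    counts = Counter(data[i:i + 2] for i in range(len(data) - 1))
    ranked = [pair for pair, _ in counts.most_common(num_groups * GROUP_SIZE)]
    # Rank r falls in group r // 256 at index r % 256.
    return {pair: divmod(rank, GROUP_SIZE) for rank, pair in enumerate(ranked)}


def tokenize(data: bytes, table: dict) -> list:
    """Greedily replace known pairs with (group, index) tokens."""
    tokens = []
    i = 0
    while i < len(data):
        pair = data[i:i + 2]
        if len(pair) == 2 and pair in table:
            group, index = table[pair]
            tokens.append(("pair", group, index))
            i += 2
        else:
            # Assumption: bytes outside any group pass through as literals.
            tokens.append(("lit", data[i]))
            i += 1
    return tokens


def detokenize(tokens: list, table: dict) -> bytes:
    """Invert tokenize() using the same pair table."""
    inverse = {slot: pair for pair, slot in table.items()}
    out = bytearray()
    for tok in tokens:
        if tok[0] == "pair":
            out += inverse[(tok[1], tok[2])]
        else:
            out.append(tok[1])
    return bytes(out)
```

In a full pipeline, the token stream and the pair table would still have to be serialized (with the prefix codeword for the group number) before being passed to the arithmetic coder; that stage is left out of this sketch.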


Similar resources

Byte Pair Transformation using Zero-Frequency Bytes with Varying Number of Passes

The byte pair encoding (BPE) algorithm was suggested by P. Gage to achieve data compression. It encodes all instances of the most frequent byte-pair using a zero-frequency byte in the source data. This process is repeated for up to the maximum possible number of passes, m, until no further compression is possible, either because there are no more frequently occurring byte pairs or there are no more unused zero-fr...


Quad-Byte Transformation using Zero-frequency Bytes

The byte pair encoding (BPE) algorithm was suggested by P. Gage to achieve data compression. It encodes all instances of the most frequent byte-pair using a zero-frequency byte in the source data. This process is repeated for up to the maximum possible number of passes, m, until no further compression is possible, either because there are no more frequently occurring byte pairs or there are no more unused zero-fre...


Context-Based Arithmetic Coding for the DCT: Achieving high compression rates with block transforms and simple context modeling

Recent image compression schemes have focused primarily on wavelet transforms, culminating in the JPEG-2000 standard. Block based DCT compression, on which the older JPEG standard is based, has been largely neglected because wavelet based coding methods appear to offer better image quality. This paper presents a simple compression algorithm that uses arithmetic coding on the bit-planes of the D...


Data Compression Modelling: Huffman and Arithmetic

The paper deals with the formal description of data transformation (the compression and decompression process). We start by briefly reviewing basic concepts of data compression and introducing the model-based approach that underlies most modern techniques. Then we present arithmetic coding and Huffman coding for data compression, and finally examine the performance of arithmetic coding. And conclude th...


Efficient modification of LZSS compression algorithm

This paper presents a new method of lossless data compression called LZPP, being an advanced modification of the well-known algorithm LZSS [1]. It introduces improvements of the LZ family algorithms [2, 3], such as the use of a special coding of two and three byte matches, use of an auxiliary entropy coder and new criteria of symbol exclusions. Minimization of the data compression ratio (bpc) h...




Publication date: 2014